feat(observability): emit failed_stage and failed_stage_delta_ms on pipeline.failed #244
…ipeline.failed pipeline.failed carried only err + pipeline_wall_clock_ms, so an operator had to scan back for the last pipeline.stage line and know the static stage order to guess which stage threw. Add a StageTracker threaded as an optional 4th arg through the 7 timeStage calls + the prompt.build site: timeStage sets it on entry, clears on success, and leaves it set on throw, so the outer catch attributes failed_stage + failed_stage_delta_ms. The finalize inner catch clears the tracker so a later throw is not mis-attributed. New strict PipelineFailedLogSchema mirrors PipelineCompletedLogSchema. Additive only. Closes #226 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_015v2bspPF1ZTTG9ZZrbBgA1
📝 Walkthrough
Changes: Stage Attribution for Pipeline Failures
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Pull request overview
This PR improves failure-path observability for the core execution pipeline by emitting stage attribution on the terminal pipeline.failed log line, so operators can immediately see which stage threw and how long it was running before failing.
Changes:
- Introduces a `StageTracker` and threads it through `runPipeline` + `timeStage` to capture the currently active stage at the moment of failure.
- Adds `failed_stage` and `failed_stage_delta_ms` to `pipeline.failed` (when applicable) and documents the new fields.
- Adds a strict `PipelineFailedLogSchema` and corresponding tests to guard against future log-field drift.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/core/log-fields.ts` | Adds `StageTracker` + `createStageTracker()`, extends `timeStage()` with optional tracking, and introduces `PipelineFailedLogSchema`. |
| `src/core/pipeline.ts` | Creates and threads a stage tracker through stages; emits `failed_stage` + `failed_stage_delta_ms` on `pipeline.failed`. |
| `test/core/log-fields.test.ts` | Adds schema drift tests for `pipeline.failed` and behavior tests for `timeStage` tracking behavior. |
| `docs/operate/observability.md` | Documents the new `pipeline.failed` fields. |
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/core/log-fields.ts`:
- Around line 74-81: The PipelineFailedLogSchema object currently allows
failed_stage and failed_stage_delta_ms to be independently optional, but they
must always appear together or not at all according to the documented contract.
Add a validation constraint to the PipelineFailedLogSchema using Zod's refine
method to enforce that either both fields are present (defined) or both are
absent (undefined), rejecting any state where only one field is provided.
In `@src/core/pipeline.ts`:
- Around line 592-595: In the catch block for the cleanup error (the block
starting at line 593 that catches cleanupError), after logging the error
message, clear the stageTracker.active field similar to the pattern already used
in the trackingComment.finalize catch block around lines 510-513. This ensures
that when cleanup fails and the error is suppressed, the stageTracker state is
reset so the outer catch handler doesn't misattribute the original failure to
the "workspace.cleanup" stage.
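
The paired-field contract from the first comment can be illustrated with a plain predicate of the kind one might hand to a schema-level refinement. This is a minimal, dependency-free sketch rather than the PR's schema code: the `StageFields` interface and the `hasPairedStageFields` name are hypothetical, and only the two field names come from the PR.

```typescript
// Hypothetical shape: only the two optional fields under discussion.
interface StageFields {
  failed_stage?: string;
  failed_stage_delta_ms?: number;
}

// Returns true when both fields are present or both are absent,
// and false when exactly one of them is defined.
function hasPairedStageFields(record: StageFields): boolean {
  const hasStage = record.failed_stage !== undefined;
  const hasDelta = record.failed_stage_delta_ms !== undefined;
  return hasStage === hasDelta;
}

// The four possible combinations of the two fields.
const bothPresent = hasPairedStageFields({
  failed_stage: "prompt.build",
  failed_stage_delta_ms: 12,
});
const bothAbsent = hasPairedStageFields({});
const stageOnly = hasPairedStageFields({ failed_stage: "prompt.build" });
const deltaOnly = hasPairedStageFields({ failed_stage_delta_ms: 12 });
```

Comparing the two presence flags for equality covers all four combinations in one expression, so a record carrying only one of the fields is rejected regardless of which one it is.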
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9f4cdf94-e07f-4578-9bee-d568ca675527
📒 Files selected for processing (4)
- docs/operate/observability.md
- src/core/log-fields.ts
- src/core/pipeline.ts
- test/core/log-fields.test.ts
…Schema Address review: failed_stage and failed_stage_delta_ms are emitted together or not at all, so a .refine rejects a record carrying only one. Add paired-contract regression tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_015v2bspPF1ZTTG9ZZrbBgA1
…trict schemas Bundles 12 area:observability issues into one change. Each family ships a Zod-strict *-log-fields.ts module with a co-located drift-prevention test, following the #242/#244 pattern: structured_output, circuit, digest, github.api.slow, github.app.token.mint, http, scheduler.scan, workflow.run, workspace, agent.tool, daemon.connection, k8s.spawn. Behavioral changes beyond logging: #236 routes 6 mint sites through a new mintInstallationToken helper (exact cache_hit via scoped hook.before); #218 removes unused onConnected/onDisconnected WsClientOptions callbacks; #223 replaces octokit hook.after/hook.error with one hook.wrap timing closure (new GITHUB_API_SLOW_REQUEST_MS env, default 3000). Security: errSerializer now strips octokit event/payload/signature carriers so the raw webhook body + HMAC signature cannot leak through err: log lines (regression-tested). Docs: 12 observability.md sections + alerts, configuration.md env var, daemon-fleet.md k8s.spawn diagnostics. Closes #216 #217 #218 #223 #228 #233 #234 #235 #236 #237 #243 #247 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Wjb51MuzTGGiJ2ggGX2DhH
What changed and why
`pipeline.failed` previously carried only `pipeline_wall_clock_ms` + `err`, giving operators no signal about which stage threw or how long it had been running: they had to scroll back through `pipeline.stage` lines and mentally reconstruct the static stage order to diagnose a failure.

This adds a `StageTracker` cursor to `runPipeline`. Each of the seven `timeStage` calls and the inline `prompt.build` site receives it as an optional fourth argument: on stage entry the tracker is set, on success it is cleared, and on throw it is left intact. The outer catch reads `stageTracker.active` and conditionally spreads `failed_stage` + `failed_stage_delta_ms` into the `pipeline.failed` line. Both fields are omitted when no timed stage was in flight (an early failure before any stage, or a failure after all stages succeeded).

One correctness detail: the `trackingComment.finalize` inner catch swallows its error and continues, so `stageTracker.active` is explicitly cleared there; otherwise a subsequent throw would be mis-attributed to finalize.

The change is additive: success path, control flow, and return values are unchanged. `timeStage`'s tracker argument is optional, so every existing caller stays valid. The new strict `PipelineFailedLogSchema` in `src/core/log-fields.ts` mirrors `PipelineCompletedLogSchema`, pinning the new fields so a rename or stray field trips the co-located test.

Flow
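
The set-on-entry, clear-on-success, retain-on-throw behaviour described above can be sketched roughly as follows. This is an illustrative approximation, not the code in `src/core/log-fields.ts`: `timeStageSketch` is a synchronous stand-in with a reduced three-argument signature, and `ActiveStage`, `failureFields`, `runSketch`, `swallowThenFail`, and the `"finalize"` stage name are all assumptions made for the example. Only `StageTracker`, `createStageTracker`, `active`, the two field names, and the `prompt.build` / `workspace.cleanup` stage names appear in the PR.

```typescript
// Assumed shape of the tracker's payload; the real interface may differ.
interface ActiveStage {
  stage: string;
  startedAt: number;
}

interface StageTracker {
  active: ActiveStage | null;
}

function createStageTracker(): StageTracker {
  return { active: null };
}

// Simplified stand-in for timeStage: sets the tracker on entry and clears it
// on success. A throw from fn() skips the clear, so the tracker stays set.
function timeStageSketch<T>(stage: string, fn: () => T, tracker?: StageTracker): T {
  if (tracker) tracker.active = { stage, startedAt: Date.now() };
  const result = fn();
  if (tracker) tracker.active = null;
  return result;
}

// Builds the two attribution fields together, or an empty object when no
// timed stage was in flight.
function failureFields(tracker: StageTracker, now: number): Record<string, unknown> {
  const active = tracker.active;
  if (active === null) return {};
  return { failed_stage: active.stage, failed_stage_delta_ms: now - active.startedAt };
}

interface Stage {
  name: string;
  run: () => void;
}

// Hypothetical pipeline loop: the outer catch spreads the attribution fields.
function runSketch(stages: Stage[]): Record<string, unknown> {
  const tracker = createStageTracker();
  let line: Record<string, unknown> = { event: "pipeline.completed" };
  try {
    for (const s of stages) timeStageSketch(s.name, s.run, tracker);
  } catch (err) {
    line = { event: "pipeline.failed", err: String(err), ...failureFields(tracker, Date.now()) };
  }
  return line;
}

// An inner catch that swallows a stage error must reset the tracker;
// otherwise the later, unrelated throw would be blamed on that stage.
function swallowThenFail(): Record<string, unknown> {
  const tracker = createStageTracker();
  let line: Record<string, unknown> = { event: "pipeline.completed" };
  try {
    try {
      timeStageSketch("finalize", () => { throw new Error("finalize failed"); }, tracker);
    } catch {
      tracker.active = null;
    }
    throw new Error("later failure outside any timed stage");
  } catch (err) {
    line = { event: "pipeline.failed", err: String(err), ...failureFields(tracker, Date.now()) };
  }
  return line;
}

const failed = runSketch([
  { name: "prompt.build", run: () => { throw new Error("boom"); } },
  { name: "workspace.cleanup", run: () => {} },
]);
const completed = runSketch([{ name: "workspace.cleanup", run: () => {} }]);
const swallowed = swallowThenFail();
```

In this sketch `failed` carries both attribution fields for `prompt.build`, while `completed` and `swallowed` carry neither, which mirrors the "both or none" behaviour the description states for the outer catch.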
Changes
- `src/core/log-fields.ts`: `StageTracker` interface + `createStageTracker()`; `timeStage` gains an optional `tracker` param (set on entry, clear on success, retain on throw, re-throw unchanged); strict `PipelineFailedLogSchema` + `PipelineFailedLog`.
- `src/core/pipeline.ts`: create the tracker, thread it to the 7 `timeStage` calls + `prompt.build`, clear it in the `finalize` inner catch, and spread the two fields in the outer catch.
- `test/core/log-fields.test.ts`: `PipelineFailedLogSchema` drift suite (valid; camelCase/extra-field/negative/wrong-event-literal/missing-required rejected) + `timeStage` tracker suite (clear-on-success, retain-on-throw, null-initial, no-tracker backward-compat).
- `docs/operate/observability.md`: `pipeline.failed` row documents the two new fields.

Test plan

- `log-fields` tests pass; each `runPipeline`-exercising handler test (review/implement/resolve) passes in isolation
- `docs:build` clean
- `bun test` failure count is baseline-identical (no new failures); the only combined-run failures are pre-existing cross-file mock pollution the isolated CI runner avoids

Closes #226
Summary by CodeRabbit
Documentation
New Features